Nonlinear overlap method for time scaling

ABSTRACT

A nonlinear overlap method for time scaling to synthesize an S₃[n] signal from an S₁[n] signal and an S₂[n] signal is disclosed. The S₁[n] and S₂[n] signals have N₁ and N₂ elements respectively. The nonlinear overlap method includes the following steps: (a) delaying the S₂[n] signal by a predetermined number of elements to form an S₅[n] signal; (b) establishing a correlogram of a cross-correlation function of the S₁[n] and S₅[n] signals; and (c) setting S₃[n] to S₁[n] when 0<=n<(the predetermined number+a maximum index+a first threshold), the maximum index corresponding to the largest magnitude of the correlogram; to a value formed by overlap-adding the S₁[n] signal and an S₄[n] signal in a weighting manner when (the predetermined number+the maximum index+the first threshold)<=n<(N₁−a second threshold); and to S₄[n−(the predetermined number+the maximum index)] when (N₁−the second threshold)<=n<=(N₂+the predetermined number+the maximum index), wherein the first and second thresholds are not equal to zero at the same time, and the S₄[n] signal is formed by delaying the S₅[n] signal by the maximum index.

BACKGROUND OF INVENTION

1. Field of the Invention

The present invention relates to a signal-synthesizing method, and more particularly, to a nonlinear overlap method for time scaling.

2. Description of the Prior Art

Due to the dramatic progress in electronic technologies, an AV player such as a karaoke machine can provide more and more sophisticated functions, such as audio clean-up, dynamic repositioning of enhanced audio and music (DREAM), and time scaling. Time scaling (also called time stretching, time compression/expansion, or time correction) is a function that elongates or shortens an audio signal while keeping the pitch of the audio signal approximately unchanged. In short, time scaling adjusts only the tempo of an audio signal.

In general, an AV player performs time scaling with one of the following three methods: Phase Vocoder, Minimum Perceived Loss Time Expansion/Compression (MPEX), and Time Domain Harmonic Scaling (TDHS). Phase Vocoder transforms an audio signal into a complex Fourier representation with the Short Time Fourier Transform (STFT), and then transforms that representation back into a time-scaled version of the original audio signal using interpolation techniques and the inverse STFT (iSTFT). MPEX is a method researched and developed by Prosoniq that simulates characteristics of human hearing, similar to an artificial neural network. MPEX records the audio signals received over a predetermined period and tries to “learn” them, so as to either elongate or shorten them. TDHS is one of the most popular methods for time scaling. TDHS first establishes an autocorrelogram of a first audio signal, the autocorrelogram consisting of a plurality of magnitudes. It then delays the first audio signal by a maximum index, the index of the maximum magnitude (the largest magnitude among all of the magnitudes of the autocorrelogram), to form a second audio signal. Lastly, it synchronizes and overlap-adds (SOLA) the first audio signal and the second audio signal to form a third audio signal that is longer than the first audio signal.
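The TDHS/SOLA procedure described above can be sketched as follows. This is a minimal pure-Python illustration, not code from any TDHS product: the function names, the caller-chosen lag search range, the normalization of the autocorrelation, and the linear cross-fade are all assumptions made here. Note that the sketch weights and adds every sample of the overlapped region, which is the work the present method seeks to reduce.

```python
def best_autocorrelation_lag(x, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the largest autocorrelation value.

    Assumes 0 < min_lag <= max_lag < len(x).  Each product sum is divided by
    the number of overlapping samples, so short lags are not favored merely
    because they overlap more samples.
    """
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        overlap = len(x) - lag
        value = sum(x[n] * x[n - lag] for n in range(lag, len(x))) / overlap
        if value > best_value:                  # ties keep the smaller lag
            best_lag, best_value = lag, value
    return best_lag


def sola_elongate(x, lag):
    """Overlap-add x with a copy of itself delayed by `lag`.

    Every sample of the overlapped region [lag, len(x)) is cross-faded,
    i.e. the whole overlap is weighted and added.
    """
    overlap = len(x) - lag
    y = []
    for n in range(len(x) + lag):
        if n < lag:
            y.append(x[n])                      # original signal only
        elif n < len(x):
            w = (n - lag) / overlap             # ramps from 0 toward 1
            y.append((1 - w) * x[n] + w * x[n - lag])
        else:
            y.append(x[n - lag])                # delayed copy only
    return y


if __name__ == "__main__":
    x = [0.0, 1.0, 0.0, -1.0] * 8               # 32 samples, period 4
    lag = best_autocorrelation_lag(x, 2, 10)
    y = sola_elongate(x, lag)
    print(lag, len(y))                          # 4 36
```

For this period-4 test signal the search selects a lag of one full period, so the 36-sample output simply continues the same pattern.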

In a computer system, the autocorrelogram is usually established by a digital signal processing (DSP) chip designed to handle complex mathematical calculations such as convolution and the fast Fourier transform (FFT). However, the process by which the DSP chip synthesizes the third audio signal from the first and second audio signals is tedious and sometimes unnecessary.

SUMMARY OF INVENTION

It is therefore a primary objective of the claimed invention to provide a nonlinear overlap method for time scaling to efficiently synthesize a third audio signal from a first audio signal and a second audio signal without dramatically sacrificing the quality of the third audio signal.

According to the claimed invention, the nonlinear overlap method for time scaling to synthesize an S₃[n] signal from an S₁[n] signal and an S₂[n] signal, the S₁[n] signal having N₁ elements and the S₂[n] signal having N₂ elements, comprises:

(a) delaying the S₂[n] signal by a predetermined number of elements and forming an S₅[n] signal;

(b) establishing a cross-correlogram of a cross-correlation function of the S₁[n] signal and the S₅[n] signal, the cross-correlogram including a plurality of magnitudes, each of the magnitudes corresponding to an index; and

(c) setting the S₃[n] signal as values of the elements of:

S₁[n], where 0<=n<(the predetermined number+a first threshold value+a maximum index), the maximum index corresponding to a largest magnitude among all of the magnitudes of the cross-correlogram;

S₁[n] weighted and added to an S₄[n] signal that lags the S₅[n] signal by the maximum index, where (the predetermined number+the first threshold value+the maximum index)<=n<(N₁−a second threshold value); and

S₄[n−(the predetermined number+the maximum index)], where (N₁−the second threshold value)<=n<=(N₂+the predetermined number+the maximum index);

wherein the first and second threshold values are not equal to zero at the same time.
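Writing d = Δ + τ_max for the total shift (Δ being the predetermined number and τ_max the maximum index) and th₁, th₂ for the first and second threshold values, step (c) can be restated compactly as below. The summary leaves the weights of the middle region open; the linear weights shown here are the ones recited in claim 2, and S₄[n − d] follows the claims' convention of indexing S₄ from its own first element.

$$
S_3[n] =
\begin{cases}
S_1[n], & 0 \le n < d + th_1,\\[4pt]
\dfrac{N_1 - th_2 - n}{N_1 - (d + th_1 + th_2)}\,S_1[n] + \dfrac{n - (d + th_1)}{N_1 - (d + th_1 + th_2)}\,S_4[n - d], & d + th_1 \le n < N_1 - th_2,\\[4pt]
S_4[n - d], & N_1 - th_2 \le n \le N_2 + d,
\end{cases}
\qquad d = \Delta + \tau_{\max},\quad (th_1, th_2) \ne (0, 0).
$$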

It is an advantage of the claimed invention that the method calculates only the values between the first threshold and the second threshold, instead of all values of the overlapped signal, which saves time for a DSP chip synthesizing the S₃[n] signal from the S₁[n] and S₂[n] signals and improves the performance of the computer in which the DSP chip is installed.

These and other objectives of the claimed invention will no doubt become obvious to those of ordinary skill in the art after reading the following detailed description of the preferred embodiment that is illustrated in the various figures and drawings.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a flow chart of a method according to the present invention.

FIG. 2 is a schematic diagram demonstrating how the method synthesizes an S₃[n] signal from an S₁[n] signal and an S₂[n] signal according to the present invention.

FIG. 3 is a schematic diagram demonstrating how the method elongates an audio signal according to the present invention.

FIG. 4 is a schematic diagram demonstrating how the method shortens an audio signal according to the present invention.

DETAILED DESCRIPTION

After establishing a cross-correlogram of a first audio signal and a second audio signal (or a signal lagging the second audio signal by a predetermined number), the cross-correlogram consisting of a plurality of magnitudes, a method 100 of the preferred embodiment of the present invention determines a maximum index corresponding to a maximum magnitude, that is, the largest magnitude in the cross-correlogram, and calculates a third audio signal according to the first audio signal, the second audio signal, the maximum index, a first threshold and a second threshold. In detail, in order to save time for a digital signal processing (DSP) chip synthesizing the third audio signal from the first and second audio signals, the method 100, having determined the maximum index and delayed the second audio signal by the maximum index, does not weight and add the entire overlapped region in which the first and second audio signals are mixed. Instead, it weights and adds only part of that overlapped region (the part between the first threshold and the second threshold) to form the third audio signal.

Please refer to FIG. 1, which is a flow chart of a method 100 of the preferred embodiment according to the present invention. The method 100 comprises the following steps:

Step 102: Start;

(An S₃[n] signal is to be synthesized from an S₁[n] signal and an S₂[n] signal. For simplicity, the S₁[n] and S₂[n] signals are defined to contain N₁ and N₂ elements respectively.)

Step 104: Delaying the S₂[n] signal by a predetermined number Δ and forming an S₅[n] signal;

(To prevent run-in from occurring while a pickup of an A/V player reads the S₃[n] signal, the method 100 delays the S₂[n] signal by the predetermined number Δ and then determines a maximum index τ_max, which is crucial for synthesizing the S₃[n] signal from the S₁[n] signal and the S₂[n] signal. In the preferred embodiment, the predetermined number Δ is equal to [N₁/3].)

Step 106: Establishing a cross-correlogram of the S₁[n] and S₅[n] signals, and delaying the S₅[n] signal by the maximum index τ_max, which corresponds to the maximum magnitude in the cross-correlogram, to form an S₄[n] signal;

(The cross-correlogram comprises a plurality of magnitudes of a cross-correlation function, each of the magnitudes corresponding to a distinct index.)
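Steps 104 and 106 can be sketched in pure Python as below. The sketch reads [N₁/3] as floor division, searches indices τ from 0 up to a caller-chosen bound, divides each product sum by the number of overlapping elements, and treats "largest magnitude" as the largest signed correlation value so that anti-correlated alignments are not selected. All of these, and the function names, are assumptions rather than details given in the text.

```python
def predetermined_delay(n1):
    """Delta = [N1/3], read here as floor division (an assumption)."""
    return n1 // 3


def cross_correlogram(s1, s2, delta, max_index):
    """Magnitude of the S1/S5 cross-correlation for each index tau = 0..max_index.

    S5 is S2 delayed by delta, so for index tau the candidate S4 begins at
    element d = delta + tau of S1.  Each magnitude is the mean product over
    the elements where S1 and the shifted S2 overlap.
    """
    magnitudes = []
    for tau in range(max_index + 1):
        d = delta + tau
        end = min(len(s1), d + len(s2))          # one past the last overlapping element
        if end <= d:
            raise ValueError(f"no overlap at index {tau}")
        total = sum(s1[n] * s2[n - d] for n in range(d, end))
        magnitudes.append(total / (end - d))
    return magnitudes


def maximum_index(magnitudes):
    """tau_max: index of the largest (signed) magnitude; the first one wins ties."""
    return max(range(len(magnitudes)), key=lambda i: magnitudes[i])


if __name__ == "__main__":
    x = [0.0, 1.0, 0.0, -1.0] * 8               # 32 elements, period 4
    delta = predetermined_delay(len(x))
    tau_max = maximum_index(cross_correlogram(x, x, delta, 8))
    print(delta, tau_max)                        # 10 2
```

With S₁ and S₂ both equal to this period-4 signal, Δ is 10 and the sketch selects τ_max = 2, so the total shift Δ + τ_max = 12 is a whole number of periods.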

Step 108: Synthesizing the S₃[n] signal from the S₁[n] signal and the S₄[n] signal;

(The S₃[n] signal is equal to

the S₁[n] signal, where 0<=n<(the predetermined number Δ+a first threshold value th₁+the maximum index τ_max);

the S₁[n] signal weighted and added to the S₄[n] signal, where (the predetermined number Δ+the first threshold value th₁+the maximum index τ_max)<=n<(N₁−a second threshold value th₂); and

the S₄[n−(the predetermined number Δ+the maximum index τ_max)] signal, where (N₁−the second threshold value th₂)<=n<=(N₂+the predetermined number Δ+the maximum index τ_max);

wherein the first threshold value th₁ and the second threshold value th₂ are not equal to zero at the same time.)
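A minimal sketch of step 108, assuming Δ and τ_max have already been found. It applies the linear weights given for the fourth part 404 of FIG. 2 (and recited in claim 2), reads S₄[n−(Δ+τ_max)] as the element of S₂ at that offset, and ends the last region at N₂+Δ+τ_max exclusive, because the inclusive bound in the text would read one element past the end of S₂. The function name and the validity checks are illustrative assumptions.

```python
def synthesize_s3(s1, s2, delta, tau_max, th1, th2):
    """Step 108 sketch: build S3 from S1 and S4, where S4 is S2 shifted by d = delta + tau_max."""
    n1, n2 = len(s1), len(s2)
    d = delta + tau_max
    start, stop = d + th1, n1 - th2              # bounds of the weighted region
    if th1 < 0 or th2 < 0 or (th1 == 0 and th2 == 0):
        raise ValueError("thresholds must be non-negative and not both zero")
    if not (d >= 0 and start < stop and n1 <= d + n2):
        raise ValueError("delta, tau_max and thresholds leave no valid overlap")
    width = stop - start                         # N1 - (delta + tau_max + th1 + th2)
    s3 = []
    for n in range(d + n2):                      # exclusive end: last element is S2[n2 - 1]
        if n < start:
            s3.append(s1[n])                     # S1 alone, including [d, d + th1) where S4 overlaps
        elif n < stop:
            w1 = (stop - n) / width              # (N1 - th2 - n) / width
            w2 = (n - start) / width             # (n - (delta + th1 + tau_max)) / width
            s3.append(w1 * s1[n] + w2 * s2[n - d])
        else:
            s3.append(s2[n - d])                 # S4 alone, including [N1 - th2, N1) where S1 overlaps
    return s3


if __name__ == "__main__":
    s3 = synthesize_s3([1.0] * 12, [3.0] * 12, delta=0, tau_max=0, th1=2, th2=2)
    print(s3)
```

In the example, with S₁ all ones, S₂ all threes and th₁ = th₂ = 2, the output holds 1.0 up to n = 2, ramps linearly to 2.0 at n = 6, and holds 3.0 from n = 10 onward; only the eight elements between the thresholds are weighted.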

Step 110: End.

Please refer to FIG. 2, which is a schematic diagram demonstrating how the method 100 synthesizes the S₃[n] signal from the S₁[n] and S₂[n] signals according to the present invention. In FIG. 2, a first part 401 shows the S₁[n] and S₂[n] signals in step 102 of the method 100, a second part 402 shows the S₁[n] and S₅[n] signals calculated in step 104, a third part 403 shows the maximum index τ_max and the S₄[n] signal calculated in step 106, and a fourth part 404 and a fifth part 405 show the S₃[n] signal synthesized from the S₁[n] and S₄[n] signals in step 108.

The S₃[n] signal shown in the fourth part 404 of FIG. 2 is equal to

$S_3[n] = \frac{N_1 - th_2 - n}{N_1 - (\Delta + \tau_{\max} + th_1 + th_2)}\, S_1[n] + \frac{n - (\Delta + th_1 + \tau_{\max})}{N_1 - (\Delta + \tau_{\max} + th_1 + th_2)}\, S_4[n - (\Delta + \tau_{\max})],$

where (the predetermined number Δ+the maximum index τ_max+the first threshold value th₁)<=n<(N₁−the second threshold value th₂).
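Naming the two weights in the formula above w₁ and w₂ and writing W for their common denominator, a short check (assuming W > 0) shows that they always sum to one and ramp linearly across the region bounded by the thresholds: S₃[n] starts from S₁[n] alone at the lower bound, and the weights reach S₄ alone at n = N₁ − th₂, which is exactly where the third region begins.

$$
\begin{aligned}
W &= N_1 - (\Delta + \tau_{\max} + th_1 + th_2), \qquad
w_1(n) = \frac{N_1 - th_2 - n}{W}, \qquad
w_2(n) = \frac{n - (\Delta + th_1 + \tau_{\max})}{W},\\
w_1(n) + w_2(n) &= \frac{N_1 - \Delta - \tau_{\max} - th_1 - th_2}{W} = 1,\\
w_1(\Delta + \tau_{\max} + th_1) &= 1, \qquad w_2(\Delta + \tau_{\max} + th_1) = 0,\\
w_1(N_1 - th_2) &= 0, \qquad w_2(N_1 - th_2) = 1.
\end{aligned}
$$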

The S₃[n] signal shown in the fifth part 405 of FIG. 2 is equal to

$S_3[n] = \frac{N_1 - n}{N_1 - (\Delta + \tau_{\max})}\, S_1[n] + \frac{n - (\Delta + \tau_{\max})}{N_1 - (\Delta + \tau_{\max})}\, S_4[n - (\Delta + \tau_{\max})],$

where (the predetermined number Δ+the maximum index τ_max+the first threshold value th₁)<=n<(N₁−the second threshold value th₂).
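Comparing the two formulas, the weight on S₁[n] for the fifth part 405 is the weight for the fourth part 404 with th₁ = th₂ = 0. Evaluated at the lower bound of the stated region, and assuming N₁ > Δ + τ_max, that weight is below one whenever th₁ > 0:

$$
\left.\frac{N_1 - th_2 - n}{N_1 - (\Delta + \tau_{\max} + th_1 + th_2)}\right|_{th_1 = th_2 = 0}
= \frac{N_1 - n}{N_1 - (\Delta + \tau_{\max})},
\qquad
\left.\frac{N_1 - n}{N_1 - (\Delta + \tau_{\max})}\right|_{n = \Delta + \tau_{\max} + th_1}
= \frac{N_1 - \Delta - \tau_{\max} - th_1}{N_1 - \Delta - \tau_{\max}}.
$$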

If the S₁[n] signal is the same as the S₂[n] signal and both are derived from the same region of an original signal S[n], as shown in FIG. 3, the method 100 in effect elongates the S₁[n] signal. On the contrary, if the S₁[n] and S₂[n] signals differ from each other and are derived from two distinct regions of the S[n] signal, as shown in FIG. 4, the method 100 in effect shortens the S₁[n] signal, an S₆[n] signal (which is discarded), and the S₂[n] signal into the S₃[n] signal.
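The two uses shown in FIG. 3 and FIG. 4 can be illustrated by how the input segments are cut from S[n] and how long S₃[n] comes out. This sketch assumes that S₁, S₆ and S₂ are back-to-back regions of S[n] in the shortening case and counts S₃ as N₂+Δ+τ_max elements (the inclusive bound in the text would add one). The helper names, the segment lengths and τ_max = 3 are hypothetical values chosen for illustration.

```python
def elongation_segments(s, start, length):
    """FIG. 3 reading: S1 and S2 are the same region of S[n]."""
    segment = s[start:start + length]
    return segment, segment


def shortening_segments(s, start, n1, n6, n2):
    """FIG. 4 reading (assumed layout): S1, a discarded S6, then S2, back to back."""
    s1 = s[start:start + n1]
    s6 = s[start + n1:start + n1 + n6]            # skipped, never written to S3
    s2 = s[start + n1 + n6:start + n1 + n6 + n2]
    return s1, s6, s2


def s3_length(n2, delta, tau_max):
    """S3 ends with S2 shifted by delta + tau_max (exclusive-bound reading)."""
    return n2 + delta + tau_max


if __name__ == "__main__":
    s = list(range(100))                          # stand-in for S[n]

    # Elongation: 30 input elements become 30 + 10 + 3 = 43 output elements.
    s1, s2 = elongation_segments(s, 0, 30)
    print(len(s1), s3_length(len(s2), len(s1) // 3, 3))

    # Shortening: 30 + 20 + 30 = 80 input elements become 30 + 10 + 3 = 43.
    s1, s6, s2 = shortening_segments(s, 0, 30, 20, 30)
    print(len(s1) + len(s6) + len(s2), s3_length(len(s2), len(s1) // 3, 3))
```

In both cases S₃ has the same length; what differs is how much of S[n] it replaces, which is why the same synthesis either stretches or compresses the original signal.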

In contrast to the prior art, the present invention provides a method to synthesize the S₃[n] signal from the S₁[n] and S₂[n] signals based on the maximum index corresponding to the maximum magnitude of the cross-correlogram, together with the first and second threshold values, which confine the overlapped region where the S₁[n] and S₂[n] signals are mixed. Instead of calculating all values of the overlapped signal, the method calculates only the values between the first threshold and the second threshold, which saves time for a DSP chip synthesizing the S₃[n] signal from the S₁[n] and S₂[n] signals and improves the performance of the computer in which the DSP chip is installed.

Following the detailed description of the present invention above, those skilled in the art will readily observe that numerous modifications and alterations of the device may be made while retaining the teachings of the invention. Accordingly, the above disclosure should be construed as limited only by the metes and bounds of the appended claims.

1. A nonlinear overlap method for time scaling to synthesize an S₃[n] signal from an S₁[n] signal and an S₂[n] signal, the S₁[n] signal having N₁ elements and the S₂[n] signal having N₂ elements, the method comprising: (a) delaying the S₂[n] signal by a predetermined number of elements and forming an S₅[n] signal; (b) establishing a cross-correlogram of a cross-correlation function of the S₁[n] signal and the S₅[n] signal, the cross-correlogram including a plurality of magnitudes, each of the magnitudes corresponding to an index; and (c) setting the S₃[n] signal as values of the elements of: S₁[n], where 0<=n<(the predetermined number+a first threshold value+a maximum index), the maximum index corresponding to a largest magnitude among all of the magnitudes of the cross-correlogram; S₁[n] weighted and added to an S₄[n] signal that lags the S₅[n] signal by the maximum index, where (the predetermined number+the first threshold value+the maximum index)<=n<(N₁−a second threshold value); and S₄[n−(the predetermined number+the maximum index)], where (N₁−the second threshold value)<=n<=(N₂+the predetermined number+the maximum index); wherein the first and second threshold values are not equal to zero at the same time.

2. The method of claim 1 wherein the S₃[n] signal is equal to (N₁−the second threshold value−n)/(N₁−(the predetermined number+the maximum index+the first threshold value+the second threshold value))*S₁[n]+(n−(the predetermined number+the maximum index+the first threshold value))/(N₁−(the predetermined number+the maximum index+the first threshold value+the second threshold value))*S₄[n−(the predetermined number+the maximum index)] while (the predetermined number+the maximum index+the first threshold value)<=n<(N₁−the second threshold value).

3. The method of claim 1 wherein the S₃[n] signal is equal to (N₁−n)/(N₁−(the predetermined number+the maximum index))*S₁[n]+(n−(the predetermined number+the maximum index))/(N₁−(the predetermined number+the maximum index))*S₄[n−(the predetermined number+the maximum index)].

4. The method of claim 1 wherein the S₁[n] signal and the S₂[n] signal are sampled from an S₁(t) signal and an S₂(t) signal respectively.

5. The method of claim 4 wherein the S₁(t) signal and the S₂(t) signal are both derived from an original signal.

6. The method of claim 5 wherein the original signal is an audio signal.

7. The method of claim 5 wherein the original signal is a video signal.

8. The method of claim 4 wherein the S₁(t) signal and the S₂(t) signal are identical.

9. The method of claim 4 wherein the S₁(t) signal and the S₂(t) signal are different from each other.

10. The method of claim 1 wherein the predetermined number is equal to [N₁/3].

11. A nonlinear overlap method for time scaling to synthesize an S₃[n] signal from an S₁[n] signal and an S₂[n] signal, the S₁[n] signal having N₁ elements and the S₂[n] signal having N₂ elements, the method comprising: (a) establishing a cross-correlogram of a cross-correlation function of the S₁[n] signal and the S₂[n] signal, the cross-correlogram including a plurality of magnitudes, each of the magnitudes corresponding to an index; and (b) setting the S₃[n] signal as values of the elements of: S₁[n], where 0<=n<(a first threshold value+a maximum index), the maximum index corresponding to a largest magnitude among all of the magnitudes of the cross-correlogram; S₁[n] weighted and added to an S₄[n] signal that lags the S₂[n] signal by the maximum index, where (the first threshold value+the maximum index)<=n<(N₁−a second threshold value); and S₄[n−the maximum index], where (N₁−the second threshold value)<=n<=(N₂+the maximum index); wherein the first and second threshold values are not equal to zero at the same time.

12. The method of claim 11 wherein the S₃[n] signal is equal to (N₁−the second threshold value−n)/(N₁−(the maximum index+the first threshold value+the second threshold value))*S₁[n]+(n−(the maximum index+the first threshold value))/(N₁−(the maximum index+the first threshold value+the second threshold value))*S₄[n−(the maximum index)] while (the maximum index+the first threshold value)<=n<(N₁−the second threshold value).

13. The method of claim 11 wherein the S₃[n] signal is equal to (N₁−n)/(N₁−the maximum index)*S₁[n]+(n−the maximum index)/(N₁−the maximum index)*S₄[n−the maximum index].

14. The method of claim 11 wherein the S₁[n] signal and the S₂[n] signal are sampled from an S₁(t) signal and an S₂(t) signal respectively.

15. The method of claim 14 wherein the S₁(t) signal and the S₂(t) signal are both derived from an original signal.

16. The method of claim 15 wherein the original signal is an audio signal.

17. The method of claim 15 wherein the original signal is a video signal.

18. The method of claim 14 wherein the S₁(t) signal and the S₂(t) signal are identical.

19. The method of claim 14 wherein the S₁(t) signal and the S₂(t) signal are different from each other.